adaptation policy
Breaking the Activation Function Bottleneck through Adaptive Parameterization
Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot
Adaptive parameterization is a means of increasing this flexibility and thereby increasing the model's capacity to learn non-linear patterns. We focus on the feed-forward layer, f(x):= ฯ(W x+b),for some activation functionฯ: R 7 R. Define the pre-activation layer as a = A(x):= Wx+band denote byg(a):= ฯ(a)/athe activation effect ofฯgivena, where divisioniselement-wise.
Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways
Costa, Miguel, Vandervoort, Arthur, Drews, Martin, Morrissey, Karyn, Pereira, Francisco C.
Climate change will cause an increase in the frequency and severity of flood events, prompting the need for cohesive adaptation policymaking. Designing effective adaptation policies, however, depends on managing the uncertainty of long-term climate impacts. Meanwhile, such policies can feature important normative choices that are not always made explicit. We propose that Reinforcement Learning (RL) can be a useful tool to both identify adaptation pathways under uncertain conditions while it also allows for the explicit modelling (and consequent comparison) of different adaptation priorities (e.g. economic vs. wellbeing). We use an Integrated Assessment Model (IAM) to link together a rainfall and flood model, and compute the impacts of flooding in terms of quality of life (QoL), transportation, and infrastructure damage. Our results show that models prioritising QoL over economic impacts results in more adaptation spending as well as a more even distribution of spending over the study area, highlighting the extent to which such normative assumptions can alter adaptation policy. Our framework is publicly available: https://github.com/MLSM-at-DTU/maat_qol_framework.
Navigating the Synchrony-Stability Frontier in Adaptive Chatbots
Adaptive chatbots that mimic a user's linguistic style can build rapport and engagement, yet unconstrained mimicry risks an agent that feels unstable or sycophantic. We present a computational evaluation framework that makes the core design tension explicit: balancing moment-to-moment linguistic synchrony against long-term persona stability. Using an 8-dimensional style vector and a closed-loop "base+delta" prompting architecture, we simulate and compare explicit adaptation policies - Uncapped, Cap, Exponential Moving Average (EMA), Dead-Band, and Hybrids - on a human-log dataset. Our analysis maps a clear Pareto frontier: bounded policies achieve substantial gains in stability at a modest cost to synchrony. For example, a Hybrid (EMA+Cap) raises stability from 0.542 to 0.878 (+62%) while reducing synchrony by only 17%. We confirm this trade-off through large-scale replications on three public corpora (DailyDialog, Persona-Chat, EmpatheticDialogues) and LLM-in-the-loop validation across two model families. Furthermore, we quantify "prompt legibility," showing that frontier policies reduce instruction churn and cut jarring register flips (major tone changes) from 0.254 to 0.092, yielding systems that are easier to reason about and maintain. Taken together, our framework provides a general evaluation harness for style adaptation; a systematic ablation that identifies Pareto-efficient policies; robust validation across diverse datasets and models; and novel legibility metrics linking policy choices to system maintainability.
AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer
Ren, Allen Z., Dai, Hongkai, Burchfiel, Benjamin, Majumdar, Anirudha
Simulation parameter settings such as contact models and object geometry approximations are critical to training robust robotic policies capable of transferring from simulation to real-world deployment. Previous approaches typically handcraft distributions over such parameters (domain randomization), or identify parameters that best match the dynamics of the real environment (system identification). However, there is often an irreducible gap between simulation and reality: attempting to match the dynamics between simulation and reality across all states and tasks may be infeasible and may not lead to policies that perform well in reality for a specific task. Addressing this issue, we propose AdaptSim, a new task-driven adaptation framework for sim-to-real transfer that aims to optimize task performance in target (real) environments -- instead of matching dynamics between simulation and reality. First, we meta-learn an adaptation policy in simulation using reinforcement learning for adjusting the simulation parameter distribution based on the current policy's performance in a target environment. We then perform iterative real-world adaptation by inferring new simulation parameter distributions for policy training, using a small amount of real data. We perform experiments in three robotic tasks: (1) swing-up of linearized double pendulum, (2) dynamic table-top pushing of a bottle, and (3) dynamic scooping of food pieces with a spatula. Our extensive simulation and hardware experiments demonstrate AdaptSim achieving 1-3x asymptotic performance and $\sim$2x real data efficiency when adapting to different environments, compared to methods based on Sys-ID and directly training the task policy in target environments. Website: https://irom-lab.github.io/AdaptSim/
Reinforcement learning based adaptive metaheuristics
Parameter adaptation, that is the capability to automatically adjust an algorithm's hyperparameters depending on the problem being faced, is one of the main trends in evolutionary computation applied to numerical optimization. While several handcrafted adaptation policies have been proposed over the years to address this problem, only few attempts have been done so far at apply machine learning to learn such policies. Here, we introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms. We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE), for which we learn, respectively, adaptation policies for the step-size (for CMA-ES), and the scale factor and crossover rate (for DE). We train these policies on a set of 46 benchmark functions at different dimensionalities, with various inputs to the policies, in two settings: one policy per function, and one global policy for all functions.
A critical look at the current train/test split in machine learning
Tan, Jimin, Yang, Jianan, Wu, Sai, Chen, Gang, Zhao, Jake
The randomized or cross-validated split of training and testing sets has been adopted as the gold standard of machine learning for decades. The establishment of these split protocols are based on two assumptions: (i)-fixing the dataset to be eternally static so we could evaluate different machine learning algorithms or models; (ii)-there is a complete set of annotated data available to researchers or industrial practitioners. However, in this article, we intend to take a closer and critical look at the split protocol itself and point out its weakness and limitation, especially for industrial applications. In many real-world problems, we must acknowledge that there are numerous situations where assumption (ii) does not hold. For instance, for interdisciplinary applications like drug discovery, it often requires real lab experiments to annotate data which poses huge costs in both time and financial considerations. In other words, it can be very difficult or even impossible to satisfy assumption (ii). In this article, we intend to access this problem and reiterate the paradigm of active learning, and investigate its potential on solving problems under unconventional train/test split protocols. We further propose a new adaptive active learning architecture (AAL) which involves an adaptation policy, in comparison with the traditional active learning that only unidirectionally adds data points to the training pool. We primarily justify our points by extensively investigating an interdisciplinary drug-protein binding problem. We additionally evaluate AAL on more conventional machine learning benchmarking datasets like CIFAR-10 to demonstrate the generalizability and efficacy of the new framework.